Canadian Birth Probability Analysis¶
By Nicole Bidwell
Introduction¶
This analysis explores the chance of being born in Canada in a particular year. A description of how to reproduce the analysis by running the data pipeline can be found in the README.md file. The outline below serves as a summary, explanation, and interpretation of the analysis.
Data¶
This analysis uses the provided birth rate and population data, obtained through the World Bank's API. The birth rate is expressed per 1,000 people, which is why the births formula later divides it by 1000.
The Chosen Years¶
This analysis covers the period from 2010 to 2023. An example calculation for the probability of being born in Canada in 2012 is included, along with the changes in likelihood over time across all countries.
Retrieving and Loading the Data¶
The script retrieve_load.py found in the src folder is used to obtain the required data from the World Bank API in JSON format. It pulls data from 2010 to 2023 across multiple pages for both birth rate and population, then uses sqlite3 to create a table, load the data into the database, and query for the required subset of the data.
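The paging step described above can be sketched as follows. This is a minimal illustration, not the contents of retrieve_load.py: the function names, the `fetch` parameter, and the indicator codes mentioned afterwards are assumptions. It relies on the World Bank API returning a two-element JSON list, with paging metadata first and the records second.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://api.worldbank.org/v2/country/all/indicator/{indicator}"


def fetch_page(indicator, page, start=2010, end=2023, per_page=1000):
    """Fetch one page of an indicator as parsed JSON (needs network access)."""
    url = (
        BASE_URL.format(indicator=indicator)
        + f"?format=json&date={start}:{end}&per_page={per_page}&page={page}"
    )
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def fetch_all_pages(indicator, fetch=fetch_page):
    """Collect the records from every page of a paginated response.

    `fetch` is injectable (a hypothetical design choice) so the paging
    logic can be exercised without calling the live API.
    """
    records = []
    page = 1
    while True:
        meta, rows = fetch(indicator, page)  # [paging metadata, records]
        records.extend(rows or [])
        if page >= int(meta["pages"]):
            break
        page += 1
    return records
```

Assuming the analysis used the crude birth rate and total population indicators, the calls would look like `fetch_all_pages("SP.DYN.CBRT.IN")` and `fetch_all_pages("SP.POP.TOTL")`.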
When querying for the required subset of data, it is important to filter for valid countries, since the original data also includes aggregate rows for groups of countries, such as regions. Including these aggregates in the processed data would have counted the same births more than once in the later calculations of total births.
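The over-counting problem can be seen in a small sqlite3 sketch. The table layout, the `is_aggregate` flag, and all figures are made up for illustration; the source does not say how retrieve_load.py identifies valid countries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE births (iso3 TEXT, name TEXT, year INTEGER, "
    "birth_rate REAL, population REAL, is_aggregate INTEGER)"
)
conn.executemany(
    "INSERT INTO births VALUES (?, ?, ?, ?, ?, ?)",
    [
        # Made-up rows: the region simply repeats countries A and B combined.
        ("AAA", "Country A", 2012, 10.0, 1_000_000, 0),
        ("BBB", "Country B", 2012, 10.0, 3_000_000, 0),
        ("RGN", "Region A+B", 2012, 10.0, 4_000_000, 1),
    ],
)


def total_births(conn, year, countries_only=True):
    """Sum births for a year, optionally excluding aggregate rows."""
    sql = "SELECT SUM(birth_rate * population / 1000.0) FROM births WHERE year = ?"
    if countries_only:
        sql += " AND is_aggregate = 0"  # hypothetical flag for regional groupings
    return conn.execute(sql, (year,)).fetchone()[0]


filtered = total_births(conn, 2012)
unfiltered = total_births(conn, 2012, countries_only=False)
print(filtered, unfiltered)
```

With the region left in, the total doubles, because every birth in countries A and B is counted a second time through the regional row.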
The Processed Data¶
After querying for the required data, a pandas data frame is created that contains the country information (ISO3 code, id, and name), year, birth rate, and population. This is the main data frame needed to later calculate the number of births, along with the required probabilities. ISO3 code, id, and name were all included to make it easier to identify valid countries, join data frames, and plot graphs.
Calculated Values¶
The script calculate_probabilities.py in the src folder is used to perform the probability calculations.
Number of Births¶
After loading the data, a column birth is added to the data frame. This provides the number of births in each year for each country, using the formula:
$$\text{Number of Births} = \frac{\text{Birth Rate}}{1000}\times\text{Population}$$
These values are used in the following calculations.
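As a quick illustration of the births formula, here is a minimal sketch in plain Python rather than pandas, to keep it dependency-free. The function name, the record layout, and the figures are assumptions for illustration, not the code in calculate_probabilities.py.

```python
import math


def number_of_births(birth_rate_per_1000, population):
    """Convert a birth rate per 1,000 people into an absolute number of births."""
    return birth_rate_per_1000 / 1000 * population


# Made-up records standing in for rows of the processed data frame.
records = [
    {"country": "Country A", "year": 2012, "birth_rate": 10.0, "population": 2_000_000},
    {"country": "Country B", "year": 2012, "birth_rate": 25.0, "population": 400_000},
]

# Add a "birth" value to each record, mirroring the added column.
for record in records:
    record["birth"] = number_of_births(record["birth_rate"], record["population"])

for record in records:
    print(record["country"], round(record["birth"]))
```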
Probability of Being Born in Canada for 2012¶
To calculate the probability of being born in Canada for a specified year, I created the function calc_probability_country. This function calculates the percentage probability of being born in any specified country for any specified year within the dataset. The two formulas used are:
$$\text{Total Worldwide Births in the Year} = \text{sum of all countries' births in the year}$$
$$\text{Percentage Probability for a Country} = \frac{\text{Country's Number of Births in the Year}}{\text{Total Worldwide Births in the Year}}\times 100$$
For calculating the probability of being born in Canada for 2012, the function is called with Canada for the country parameter and 2012 for the year parameter. For a more tangible interpretation, I included the equivalent ratio using the formula:
$$\text{Ratio Value} = \frac{1}{\text{Percentage Probability}}\times100$$
These values are saved in the output folder.
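The two probability formulas and the ratio can be sketched as below. This is an illustrative stand-in for calc_probability_country, not its actual implementation: the function names, the record layout, and the sample records are hypothetical.

```python
def birth_probability_pct(records, country, year):
    """Percentage of a year's worldwide births that occurred in one country."""
    in_year = [r for r in records if r["year"] == year]
    total = sum(r["birth"] for r in in_year)
    country_births = sum(r["birth"] for r in in_year if r["country"] == country)
    return country_births / total * 100


def one_in_n(percentage):
    """Express a percentage probability as the N in '1 in N'."""
    return 1 / percentage * 100


# Made-up births for a two-country world.
records = [
    {"country": "Country A", "year": 2012, "birth": 25.0},
    {"country": "Country B", "year": 2012, "birth": 75.0},
    {"country": "Country A", "year": 2013, "birth": 50.0},
    {"country": "Country B", "year": 2013, "birth": 50.0},
]

pct = birth_probability_pct(records, "Country A", 2012)
print(pct, one_in_n(pct))
```

Applied to the reported Canadian figure, `one_in_n(0.261)` rounds to 383.14.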
Probability of Being Born in any Specified Country for any Specified Year¶
The calc_probability_country function was also used to calculate the probabilities of being born in all other countries for each year. These values are saved as a CSV file, countries_prob.csv, in the data folder, for easy reference.
Global Average Number of Births per Year¶
The last value I calculated was the global average number of births per year. Obtained using the following formula, this value later provides insight when interpreting the differences in birth probabilities from year to year.
$$\text{Global Average Number of Births per Year} = \frac{\sum{\text{(Total Births per Year)}}}{\text{Total Number of Years}}$$
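A minimal sketch of this average, assuming the same hypothetical record layout of one birth count per country per year (the function name and figures are illustrative only):

```python
from collections import defaultdict


def global_average_births(records):
    """Average the worldwide birth totals over the years present in the data."""
    totals = defaultdict(float)
    for record in records:
        totals[record["year"]] += record["birth"]  # total births per year
    return sum(totals.values()) / len(totals)


# Made-up data: 100 births in 2012 and 140 births in 2013.
records = [
    {"country": "Country A", "year": 2012, "birth": 25.0},
    {"country": "Country B", "year": 2012, "birth": 75.0},
    {"country": "Country A", "year": 2013, "birth": 60.0},
    {"country": "Country B", "year": 2013, "birth": 80.0},
]

average = global_average_births(records)
print(average)
```

The sum over countries happens first, per year, and only the yearly totals are averaged, which matches the formula above.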
Results and Interpretation¶
Probability of Being Born in Canada in 2012¶
The probability of being born in Canada in 2012 is $0.261\%$, which means that roughly 1 in every 383.14* people born in 2012 was born in Canada.
*Calculated by $1\div0.261\times100\approx383.14$
Data Visualization and Interpretation¶
The script graphs.py in the src folder is used to generate plots using Plotly Graph Objects and Plotly Express, which are later saved in the output folder. Each plot has a corresponding function in the script that was used to generate the plot. These plots allow for easier interpretation and deeper analysis of the birth probabilities.
1. Canada Bar Chart for 2012¶
Script function name: canada_bar_chart.
This plot displays the probability of being born in Canada in 2012. It provides a straightforward visual comparison between the probability of being born in Canada and elsewhere in 2012. When hovering over the bars, we can confirm the exact values.
Here we see that the probability of being born in Canada in 2012 appears to be small. Further analysis provides more meaningful insight.
2. Canada Trend Line Over Time¶
Script function name: canada_timeline.
This plot displays the change in probability of being born in Canada from 2010 to 2023.
Here we see that the 2012 probability of $0.261\%$ is the minimum over the period from 2010 to 2023. The maximum probability, $0.278\%$, occurred in 2021, giving a range of $0.017$ percentage points. This difference may seem small, but when we consider the global average number of births per year, calculated to be roughly $140,039,787.23$ people, a difference of $0.017$ percentage points corresponds to roughly $23,806.76$* more people born in Canada in 2021 than in 2012, holding global births at that average.
*Calculated by $0.017/100\times140,039,787.23 \approx 23,806.76$
3. Top 5, Bottom 5, and Canada Timeline¶
Script function name: country_timeline.
Similar to the Canada Trend Line Over Time, this plot includes additional countries' probability trend lines between 2010 and 2023. The included countries are the 5 countries with the highest average probability (India, China, Nigeria, Pakistan, and Indonesia) and the 5 countries with the lowest average probability (Nauru, British Virgin Islands, San Marino, Tuvalu, and Palau), along with Canada for comparison. The legend lists them in descending order.
The values in the legend can be clicked to better display overlapping trend lines.
While Canada is not one of the lowest 5 countries, it remains closer to the bottom than the top. From this graph, it is also evident that most of the countries have relatively stable birth probabilities over the period, except for China and India. In China, we see a downward trend following 2017. In India, we see a slight downward trend from 2010 to 2014, followed by some stability.
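Selecting the countries with the highest and lowest average probability could be done along these lines. This is a sketch under assumptions: the function name, the input layout (a list of yearly probabilities per country), and the figures are hypothetical, and country_timeline may select its countries differently.

```python
from statistics import mean


def rank_by_average(prob_by_country, n=5):
    """Return the n highest and n lowest countries by average probability.

    Both lists are in descending order of average probability.
    """
    averages = {country: mean(values) for country, values in prob_by_country.items()}
    ordered = sorted(averages, key=averages.get, reverse=True)
    return ordered[:n], ordered[-n:]


# Made-up yearly percentage probabilities for five countries.
prob_by_country = {
    "Country C": [10.0, 10.0],
    "Country A": [30.0, 32.0],
    "Country E": [0.5, 0.5],
    "Country B": [20.0, 22.0],
    "Country D": [1.0, 1.0],
}

top, bottom = rank_by_average(prob_by_country, n=2)
print(top, bottom)
```

If the data holds fewer than `2 * n` countries, the two lists overlap, so a real implementation would need to handle that case.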
Conclusions¶
Birth rate and population changes play a crucial role in the functioning of society on a wide scale. Whether it’s the impact on the economy or the demand for social services, including public education, changes in these factors can have significant effects on resource allocation and long-term planning.
This analysis found that the probability of being born in Canada in 2012 is $0.261\%$ (or roughly 1 in 383.14). It also provided insight into how birth probabilities changed from 2010 to 2023. The trend lines show that the biggest change in Canada's birth probability is between 2012 and 2021, from $0.261\%$ to $0.278\%$. At the global average number of births per year, this change equates to roughly $23,806.76$ more people being born in Canada in 2021 than in 2012. That said, Canada's birth probabilities remained relatively stable compared to countries like China and India, which had the highest birth probabilities but also displayed more fluctuation. The most notable fluctuation was the drop in China's birth probabilities from 2019 onwards. This drop may have been influenced by the impacts of COVID-19.
This analysis could be expanded by extending the timeframe beyond the 2010-2023 period. Observing a wider period would allow for deeper insight and would likely reveal more fluctuation in the trend lines. Understanding population and birth rate patterns, combined with forecasting techniques, could improve planning for social services and public education to better meet dynamic community needs.